Individual-heterogeneous sub-Gaussian Mixture Models

Qing, Huan

arXiv.org Machine Learning

The classical Gaussian mixture model assumes homogeneity within clusters, an assumption that often fails in real-world data where observations naturally exhibit varying scales or intensities. To address this, we introduce the individual-heterogeneous sub-Gaussian mixture model, a flexible framework that assigns each observation its own heterogeneity parameter, thereby explicitly capturing the heterogeneity inherent in practical applications. Built upon this model, we propose an efficient spectral method that provably achieves exact recovery of the true cluster labels under mild separation conditions, even in high-dimensional settings where the number of features far exceeds the number of samples. Numerical experiments on both synthetic and real data demonstrate that our method consistently outperforms existing clustering algorithms, including those designed for classical Gaussian mixture models.
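
As a rough illustration of the kind of pipeline the abstract describes, the sketch below clusters the rows of a data matrix by running k-means on its leading left singular vectors. This is a generic spectral-clustering baseline under assumed inputs (an n-by-p matrix X and a known cluster count K); the name spectral_cluster and all details are illustrative, and the paper's actual estimator, in particular its handling of the per-observation heterogeneity parameters, may differ.

```python
# Generic spectral clustering sketch: SVD embedding followed by k-means.
# Illustrative only; not the paper's exact algorithm.
import numpy as np
from numpy.linalg import svd
from sklearn.cluster import KMeans

def spectral_cluster(X, K, seed=0):
    """Cluster the rows of X (n samples x p features) into K groups."""
    # The leading K left singular vectors capture the low-rank mean
    # structure of a mixture model, even in the p >> n regime the
    # abstract emphasizes.
    U, _, _ = svd(X, full_matrices=False)
    embedding = U[:, :K]
    return KMeans(n_clusters=K, n_init=10, random_state=seed).fit_predict(embedding)
```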


Equality of Opportunity in Classification: A Causal Approach

Junzhe Zhang, Elias Bareinboim

Neural Information Processing Systems

Despite this noble goal, it has been acknowledged in the literature that statistical tests based on the EO are oblivious to the underlying causal mechanisms that generated the disparity in the first place (Hardt et al. 2016).
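
For concreteness, equality of opportunity in the sense of Hardt et al. (2016) asks that true-positive rates be equal across protected groups. The hypothetical helper below computes that purely statistical disparity; as the excerpt notes, such a test says nothing about the causal mechanism that produced it.

```python
import numpy as np

def eo_disparity(y_true, y_pred, group):
    """Largest gap in true-positive rates across groups.

    A purely observational EO test: it measures the disparity but is
    oblivious to how that disparity was generated.
    """
    rates = {}
    for g in np.unique(group):
        mask = (group == g) & (y_true == 1)  # positives within group g
        rates[g] = y_pred[mask].mean()       # group-specific TPR
    vals = list(rates.values())
    return max(vals) - min(vals), rates
```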


Discriminative classification with generative features: bridging Naive Bayes and logistic regression

Terner, Zachary, Petersen, Alexander, Wang, Yuedong

arXiv.org Machine Learning

We introduce Smart Bayes, a new classification framework that bridges generative and discriminative modeling by integrating likelihood-ratio-based generative features into a logistic-regression-style discriminative classifier. From the generative perspective, Smart Bayes relaxes the fixed unit weights of Naive Bayes by allowing data-driven coefficients on density-ratio features. From the discriminative perspective, it constructs transformed inputs as marginal log-density ratios that explicitly quantify how much more likely each feature value is under one class than another, thereby providing predictors with stronger class separation than the raw covariates. To support this framework, we develop a spline-based estimator for univariate log-density ratios that is flexible, robust, and computationally efficient. In extensive simulations and real-data studies, Smart Bayes often outperforms both logistic regression and Naive Bayes. Our results highlight the potential of hybrid approaches that exploit generative structure to enhance discriminative performance.
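
A minimal sketch of the idea, assuming binary classes: estimate each feature's class-conditional marginal densities, replace the raw covariate with its log-density ratio, and fit logistic regression on the transformed features. Gaussian KDE stands in for the paper's spline-based ratio estimator, and every name here (fit_smart_bayes, transform) is illustrative rather than the authors' implementation.

```python
# Smart Bayes sketch: marginal log-density-ratio features fed into
# logistic regression. KDE is a stand-in for the spline estimator.
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.linear_model import LogisticRegression

def fit_smart_bayes(X, y, eps=1e-12):
    """X: (n, p) array; y: binary labels in {0, 1}."""
    kdes = []
    for j in range(X.shape[1]):
        f0 = gaussian_kde(X[y == 0, j])  # class-0 marginal density of feature j
        f1 = gaussian_kde(X[y == 1, j])  # class-1 marginal density of feature j
        kdes.append((f0, f1))

    def transform(Z):
        # Each column becomes log f1(x_j)/f0(x_j): how much more likely
        # the value is under class 1 than class 0.
        cols = [np.log(f1(Z[:, j]) + eps) - np.log(f0(Z[:, j]) + eps)
                for j, (f0, f1) in enumerate(kdes)]
        return np.column_stack(cols)

    clf = LogisticRegression().fit(transform(X), y)
    return clf, transform
```

Note the connection the abstract draws: fixing every coefficient in the fitted model to one (up to the intercept) would recover the Naive Bayes decision rule, which is exactly the constraint Smart Bayes relaxes by learning data-driven weights.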